Oaxaca
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > Mexico > Oaxaca (0.04)
- North America > United States > Nebraska (0.04)
- (4 more...)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
- Europe > Austria > Vienna (0.14)
- Oceania (0.05)
- (29 more...)
- Law (1.00)
- Government (0.93)
Lost tomb of the mysterious 'cloud people' unearthed after 1,400 years in 'discovery of the decade'
It has been hailed as 'the most significant archaeological discovery in a decade.' Archaeologists in Mexico have uncovered a 1,400-year-old tomb in the Central Valleys of Oaxaca that had been lost to history. The stone structure, built by the Zapotec culture, known as Be'ena'a, or 'The Cloud People', is adorned with sculptures, murals and carved symbols that suggest ritual significance. The Zapotec believed their ancestors descended from the clouds and that, in death, their souls returned to the heavens as spirits.
- North America > Mexico > Oaxaca (0.24)
- North America > United States > Texas (0.24)
- North America > Canada > Alberta (0.14)
- (17 more...)
- Transportation (1.00)
- Media > Television (1.00)
- Media > Music (1.00)
- (6 more...)
- North America > United States > Massachusetts (0.05)
- North America > United States > Louisiana (0.05)
- North America > Mexico > Oaxaca (0.05)
- (7 more...)
Morphologically-Informed Tokenizers for Languages with Non-Concatenative Morphology: A case study of Yoloxóchitl Mixtec ASR
This paper investigates the impact of using morphologically-informed tokenizers to aid and streamline the interlinear gloss annotation of an audio corpus of Yoloxóchitl Mixtec (YM) using a combination of ASR and text-based sequence-to-sequence tools, with the goal of improving efficiency while reducing the workload of a human annotator. We present two novel tokenization schemes that separate words in a nonlinear manner, preserving information about tonal morphology as much as possible. One of these approaches, a Segment and Melody tokenizer, simply extracts the tones without predicting segmentation. The other, a Sequence of Processes tokenizer, predicts segmentation for the words, which could allow an end-to-end ASR system to produce segmented and unsegmented transcriptions in a single pass. We find that these novel tokenizers are competitive with BPE and Unigram models, and the Segment-and-Melody model outperforms traditional tokenizers in terms of word error rate but does not reach the same character error rate. In addition, we analyze tokenizers on morphological and information-theoretic metrics to find predictive correlations with downstream performance. Our results suggest that nonlinear tokenizers designed specifically for the non-concatenative morphology of a language are competitive with conventional BPE and Unigram models for ASR. Further research will be necessary to determine the applicability of these tokenizers in downstream processing tasks.
- Europe > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
- Europe > Germany > Saxony > Leipzig (0.04)
- Asia > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
- (8 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
The Best Home Cocktail Machines--and Whether You Need One
Automatic cocktail machines are silly, but also kind of fun. Here's how to choose between the Bartesian and Barsys devices. The machine on my kitchen table is a holy device, if your definition of "holy" is that it looks like a glowing halo and it's filled with spirits. The machine has taken up a task I consider sacred: making me a cocktail. In advance of holiday party season, I have been testing a pair of devices that promise an indulgent future, a life where machines can make you a passable Old Fashioned.
- North America > Mexico > Oaxaca (0.05)
- North America > United States > California (0.04)
- Europe > Slovakia (0.04)
- Europe > Czechia (0.04)
- North America > United States > California > San Mateo County > Menlo Park (0.05)
- North America > Mexico > Oaxaca (0.04)
- North America > United States > New Jersey > Mercer County > Princeton (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Research Report > Strength High (1.00)
- Research Report > Experimental Study (1.00)
Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages
Omnilingual ASR team, Keren, Gil, Kozhevnikov, Artyom, Meng, Yen, Ropers, Christophe, Setzler, Matthew, Wang, Skyler, Adebara, Ife, Auli, Michael, Balioglu, Can, Chan, Kevin, Cheng, Chierh, Chuang, Joe, Droof, Caley, Duppenthaler, Mark, Duquenne, Paul-Ambroise, Erben, Alexander, Gao, Cynthia, Gonzalez, Gabriel Mejia, Lyu, Kehan, Miglani, Sagar, Pratap, Vineel, Sadagopan, Kaushik Ram, Saleem, Safiyyah, Turkatenko, Arina, Ventayol-Boada, Albert, Yong, Zheng-Xin, Chung, Yu-An, Maillard, Jean, Moritz, Rashel, Mourachko, Alexandre, Williamson, Mary, Yates, Shireen
Automatic speech recognition (ASR) has advanced in high-resource languages, but most of the world's 7,000+ languages remain unsupported, leaving thousands of long-tail languages behind. Expanding ASR coverage has been costly and limited by architectures that restrict language support, making extension inaccessible to most--all while entangled with ethical concerns when pursued without community collaboration. To transcend these limitations, we introduce Omnilingual ASR, the first large-scale ASR system designed for extensibility. Omnilingual ASR enables communities to introduce unserved languages with only a handful of data samples. It scales self-supervised pre-training to 7B parameters to learn robust speech representations and introduces an encoder-decoder architecture designed for zero-shot generalization, leveraging an LLM-inspired decoder. This capability is grounded in a massive and diverse training corpus; by combining breadth of coverage with linguistic variety, the model learns representations robust enough to adapt to unseen languages. Incorporating public resources with community-sourced recordings gathered through compensated local partnerships, Omnilingual ASR expands coverage to over 1,600 languages, the largest such effort to date--including over 500 never before served by ASR. Automatic evaluations show substantial gains over prior systems, especially in low-resource conditions, and strong generalization. We release Omnilingual ASR as a family of models, from 300M variants for low-power devices to 7B for maximum accuracy. We reflect on the ethical considerations shaping this design and conclude by discussing its societal impact. In particular, we highlight how open-sourcing models and tools can lower barriers for researchers and communities, inviting new forms of participation. Open-source artifacts are available at https://github.com/facebookresearch/omnilingual-asr.
- North America > Canada > Alberta (0.14)
- Europe > Austria > Vienna (0.14)
- Africa > Sudan (0.14)
- (53 more...)
- Health & Medicine (1.00)
- Education (0.67)
- Information Technology (0.67)
- Africa > Senegal > Kolda Region > Kolda (0.05)
- North America > Mexico > Oaxaca (0.04)
Statistical Properties of Rectified Flow
Mena, Gonzalo, Kuchibhotla, Arun Kumar, Wasserman, Larry
Rectified flow (Liu et al., 2022; Liu, 2022; Wu et al., 2023) is a method for defining a transport map between two distributions, and enjoys popularity in machine learning, although theoretical results supporting the validity of these methods are scant. The rectified flow can be regarded as an approximation to optimal transport, but in contrast to other transport methods that require optimization over a function space, computing the rectified flow only requires standard statistical tools such as regression or density estimation. Because of this, one can leverage standard data analysis tools for regression and density estimation to develop empirical versions of transport maps. We study some structural properties of the rectified flow, including existence, uniqueness, and regularity, as well as the related statistical properties, such as rates of convergence and central limit theorems, for some selected estimators. To do so, we analyze separately the bounded and unbounded cases as each presents unique challenges. In both cases, we are able to establish convergence at faster rates than the ones for the usual nonparametric regression and density estimation.
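For orientation, the standard construction behind the rectified flow (following Liu et al., 2022) can be sketched as below. The symbols $X_0$, $X_1$, $X_t$, $v$, and $Z_t$ are notation assumed here for illustration and do not appear in the abstract. Given a coupling $(X_0, X_1)$ of the two distributions:

\[
X_t = (1-t)\,X_0 + t\,X_1, \qquad
v(x,t) = \mathbb{E}\left[\,X_1 - X_0 \mid X_t = x\,\right], \qquad
\frac{dZ_t}{dt} = v(Z_t, t), \quad Z_0 \sim \mathrm{Law}(X_0), \quad t \in [0,1].
\]

The transport map sends $Z_0$ to $Z_1$. Because $v$ is a conditional expectation, it minimizes the squared-error criterion $\int_0^1 \mathbb{E}\,\lVert X_1 - X_0 - v(X_t,t)\rVert^2\,dt$, which is why the abstract can say that regression-type tools suffice to compute it.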
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- (6 more...)
- Information Technology > Data Science (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.92)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.74)